{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "z-6LLOPZouLg"
   },
   "source": [
    "# LoRAアダプタの読み込み\n",
    "\n",
    "このノートブックでは、事前学習されたモデルにLoRAアダプタを読み込む方法を示します。LoRA（低ランク適応）は、事前学習されたモデルの重みを固定し、学習可能なランク分解行列をモデルの層に注入するパラメータ効率の良い微調整技術です。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fXqd9BXgouLi"
   },
   "source": [
    "## 1. 開発環境のセットアップ\n",
    "\n",
    "最初のステップは、Hugging FaceライブラリとPytorchをインストールすることです。`trl`、`transformers`、`datasets`を含みます。`trl`について聞いたことがない場合でも心配ありません。これは`transformers`と`datasets`の上に構築された新しいライブラリで、微調整、RLHF、オープンLLMの調整を容易にします。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "tKvGVxImouLi"
   },
   "outputs": [],
   "source": [
    "# Google Colabでの要件のインストール\n",
    "# !pip install transformers datasets trl huggingface_hub\n",
    "\n",
    "# Hugging Faceへの認証\n",
    "\n",
    "from huggingface_hub import login\n",
    "\n",
    "login()\n",
    "\n",
    "# 便利のため、Hugging Faceのトークンを環境変数HF_TOKENとして設定できます"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XHUzfwpKouLk"
   },
   "source": [
    "## 2. モデルとアダプタの読み込み"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "z4p6Bvo7ouLk"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Downloading (…)okenizer_config.json: 100%|██████████| 28.0/28.0 [00:00<00:00, 28.0kB/s]\n",
       "Downloading (…)lve/main/config.json: 100%|██████████| 1.11k/1.11k [00:00<00:00, 1.11MB/s]\n",
       "Downloading (…)olve/main/vocab.json: 100%|██████████| 1.06M/1.06M [00:00<00:00, 1.06MB/s]\n",
       "Downloading (…)olve/main/merges.txt: 100%|██████████| 525k/525k [00:00<00:00, 525kB/s]\n",
       "Downloading (…)/main/tokenizer.json: 100%|██████████| 2.13M/2.13M [00:00<00:00, 2.13MB/s]\n",
       "Downloading (…)/main/adapter_config.json: 100%|██████████| 472/472 [00:00<00:00, 472kB/s]\n",
       "Downloading (…)/adapter_model.bin: 100%|██████████| 1.22G/1.22G [00:10<00:00, 120MB/s]\n",
       "Downloading pytorch_model.bin: 100%|██████████| 1.22G/1.22G [00:10<00:00, 120MB/s]\n"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# モデルとアダプタの読み込み\n",
    "from transformers import AutoModelForCausalLM\n",
    "from peft import PeftModel\n",
    "\n",
    "base_model = AutoModelForCausalLM.from_pretrained(\"HuggingFaceTB/SmolLM2-135M\")\n",
    "peft_model_id = \"HuggingFaceTB/SmolLM2-135M-LoRA\"\n",
    "model = PeftModel.from_pretrained(base_model, peft_model_id)\n",
    "model.eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9TOhJdtsouLk"
   },
   "source": [
    "## 3. モデルのテスト\n",
    "\n",
    "モデルが正しく読み込まれたことを確認するために、いくつかのサンプル入力でテストします。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "z4p6Bvo7ouLk"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Downloading (…)okenizer_config.json: 100%|██████████| 28.0/28.0 [00:00<00:00, 28.0kB/s]\n",
       "Downloading (…)lve/main/config.json: 100%|██████████| 1.11k/1.11k [00:00<00:00, 1.11MB/s]\n",
       "Downloading (…)olve/main/vocab.json: 100%|██████████| 1.06M/1.06M [00:00<00:00, 1.06MB/s]\n",
       "Downloading (…)olve/main/merges.txt: 100%|██████████| 525k/525k [00:00<00:00, 525kB/s]\n",
       "Downloading (…)/main/tokenizer.json: 100%|██████████| 2.13M/2.13M [00:00<00:00, 2.13MB/s]\n"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# トークナイザーの読み込み\n",
    "from transformers import AutoTokenizer\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"HuggingFaceTB/SmolLM2-135M\")\n",
    "\n",
    "# サンプル入力の生成\n",
    "inputs = tokenizer(\"こんにちは、元気ですか？\", return_tensors=\"pt\")\n",
    "outputs = model.generate(**inputs, max_new_tokens=50)\n",
    "print(tokenizer.decode(outputs[0], skip_special_tokens=True))"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
